feat(tensilelite): stop treating tensilelite host as internal library#8133
Conversation
e1c74f5 to
48d4727
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project status has failed because the head coverage (76.92%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #8133 +/- ##
===========================================
- Coverage 71.33% 71.30% -0.03%
===========================================
Files 2628 2614 -14
Lines 413045 409845 -3200
Branches 61875 61235 -640
===========================================
- Hits 294615 292203 -2412
+ Misses 96656 96190 -466
+ Partials 21774 21452 -322
*This pull request uses carry forward flags. Click here to find out more.
🚀 New features to boost your workflow:
|
48d4727 to
2259295
Compare
…UILD_SHARED_LIBS (default OFF, static embed)
Stop shipping tensilelite-host as a standalone shared .so that both hipBLASLt
and hipSPARSELt produce and overlay into one ROCm dist (collision). Default to
static-embed in both consumers via one native switch:
option(TENSILELITE_BUILD_SHARED_LIBS ... OFF)
set(BUILD_SHARED_LIBS ${TENSILELITE_BUILD_SHARED_LIBS})
add_library(tensilelite-host) # type derived natively from BUILD_SHARED_LIBS
--exclude-libs,ALL and the install block gate on BUILD_SHARED_LIBS; the
hipBLASLt co-export of roc::tensilelite-host and hipSPARSELt's find_package
gate on the cache var. hipSPARSELt forces the switch OFF and only
find_package(hipblaslt) when "hipblaslt" IN_LIST THEROCK_PROVIDED_PACKAGES,
so it auto-enables when a future TheRock edge declares the dependency.
No LLVM symbol re-leak: the static-embed consumer .so carries the same LLVM
cl::opt exposure as develop's OBJECT embed (no consumer --exclude-libs in
either; develop is green with YAML=ON). The P2 failure was the standalone
overlaid .so, which this default does not produce. Shared (.so) deferred
behind the switch until a single-producer TheRock dependency edge exists.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…alls on STANDALONE The hipSPARSELt test suite fails every case with "Could not initialize Tensile library" because TensileLibrary_lazy_<arch>.dat is absent from the install. Root cause: 51f9d51 added EXCLUDE_FROM_ALL to add_subdirectory(hipblaslt) to keep hipBLASLt's package surface out of hipSPARSELt. On TheRock the add_subdirectory path is the active one (hipBLASLt is not in hipSPARSELt's declared deps, so it is not in THEROCK_PROVIDED_PACKAGES), and the device libraries are produced by add_custom_target(... ALL) that nothing links. EXCLUDE_FROM_ALL drops that target from the default build AND suppresses the subdirectory's install rules, so the device library is never generated or installed. EXCLUDE_FROM_ALL was masking a real defect: origami and stinkytofu installed their package surface unconditionally, unlike rocisa and tensilelite-host which only install when built standalone/shared. Gate origami's and stinkytofu's install/export/package rules on <PROJ>_STANDALONE (top-level project = will be find_package'd = installs its surface; add_subdirectory = embedded/static = installs nothing), matching what rocisa already does. With that in place hipSPARSELt can simply drop EXCLUDE_FROM_ALL: hipBLASLt's own device-library target builds and its rocm_install ships the library to lib/hipsparselt with no embedded-dep pollution. Verified by building the device library through the add_subdirectory path for gfx942: TensileLibrary_lazy_gfx942.dat.zlib installs to lib/hipsparselt/library and the install tree contains no origami/stinkytofu headers, cmake exports, or libraries. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The previous commit gated origami's and stinkytofu's install/export/package rules on <PROJ>_STANDALONE to keep them out of the hipSPARSELt artifact. That gating is unnecessary: origami and stinkytofu are already add_subdirectory'd with EXCLUDE_FROM_ALL at every embedded site (hipblaslt and rocisa), so the installer never descends into their directories and their package surface is never part of the install walk -- whether or not the rules are wrapped in a STANDALONE guard. Verified on cmake 3.27.9: an EXCLUDE_FROM_ALL subdirectory installs nothing even when its targets are built as link dependencies, and a non-excluded parent still installs its own rules. Revert both dependency edits. The hipSPARSELt device-library fix is the single add_subdirectory(hipblaslt) EXCLUDE_FROM_ALL removal in the prior commit; no changes to shared/origami or shared/stinkytofu are needed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… loading Switch the logic-file load from ParallelMap2 return_as="generator_unordered" to "generator". joblib >=1.5.0's unordered _retrieve iterates self._jobs_set unlocked (next(iter(...))) while the dispatcher thread mutates it, racing into RuntimeError: Set changed size during iteration. The ordered generator path waits on self._jobs[0] and never iterates _jobs_set, so it is immune. Result is unchanged: the loop merges into masterLibraries and sorts explicitly for determinism, so consumption order is irrelevant. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
❌ PR Check — Action Required
📖 Need help? See the Policy FAQ for details on every check and how to fix failures. |
|
🚫 Please fix the failed policies before requesting reviews. The following policy checks failed:
The |
…t, drop the install copy The codegen generator (Tensile/rocisa) is a build requirement provisioned a priori via `uv/pip install tensile`, not materialized by CMake install rules. - Drop the hand-enumerated Tensile python subset copy (HipBLASLtCodegenInstall.cmake) and the redundant codegen-requirements.txt; tensilelite/requirements.txt and pyproject.toml are the authoritative dep lists (install.sh already installs them). - Keep exporting the consumer cmake surface: install HipBLASLtCodegen.cmake and include it from hipblaslt-config.cmake so find_package(hipblaslt) provides hipblaslt_create_device_library() (hipSPARSELt-on-TheRock relies on this). - Installed config points HIPBLASLT_PYTHON_COMMAND at a bare python3 (installed tensile) instead of PYTHONPATH=share/hipblaslt/codegen, and resolves HIPBLASLT_CODEGEN_ROOT from the a-priori-installed tensile package. Python is looked up QUIET/optional so pure C++ consumers are unaffected. - Centralize HIPBLASLT_CODEGEN_ROOT into one overridable top-level cache var; the bundled-python PYTHONPATH and device-library codegen both derive from it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ensilelite-shared-p2-tensilelite # Conflicts: # projects/hipblaslt/tensilelite/client/include/TimingInstrumentation.hpp
…is PR The diagnostics facility (Diagnostic.hpp, DIAGNOSTICS.md, the client phase-tracking, and the Python harness mirror) is functional scope orthogonal to this shared-library/visibility change. Extracted to its own branch (tensilelite-client-diagnostics); the facility-touched files are restored to develop, taking develop's reworked ScopedTimer (#6043). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- hipblaslt-config.cmake.in: drop the runtime Python3/import-Tensile codegen
discovery; deterministically include HipBLASLtCodegen.cmake via
CMAKE_CURRENT_LIST_DIR (co-located in the install).
- Eliminate HIPBLASLT_CODEGEN_ROOT: HipBLASLtCodegen.cmake self-locates its
Tensile companion files (known_bugs.yaml, TensileLogic) via
CMAKE_CURRENT_LIST_DIR; the build-time PYTHONPATH uses the known source path
${hipblaslt_SOURCE_DIR}/tensilelite.
- Ungate the codegen cmake install (codegen on Windows worked before).
- tensilelite-host: wrap hip::host in $<BUILD_INTERFACE:> so it does not leak
into the exported interface, matching rocisa/origami.
- hipSPARSELt: drop the THEROCK_PROVIDED_PACKAGES find_package(hipblaslt)
branch so the dependency is not aware of its parent's options; collapse to
the add_subdirectory path (find_package switch deferred to a follow-on).
Verified: device-library build+install ships TensileLibrary_lazy_<arch>.dat
for hipBLASLt (gfx90a) and hipSPARSELt embedded (gfx942).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Overriding - there is no unit test that is appropriate for the source file changes - the changes do not impact behavior they impact symbol visibility. We should create an issue to add a test that pulls the most recent develop artifacts and compares artifacts including sizes and a symbol inventory/visibility check. It shouldn't hard fail but should warn loudly if there is a difference. |
Summary
hipBLASLt and hipSPARSELt each compiled their own copy of the TensileLite host code, so
both
libhipblaslt.soandlibhipsparselt.soexported the same ~117 TensileLite symbols.Under ELF flat-namespace interposition the loader binds all references to whichever copy
loads first — the cause of a March 2026 segfault (#6917, originally worked around by hiding
all TensileLite symbols at the source level with bespoke
TENSILE_HIDDEN_BEGIN/ENDwrappers).
This PR makes
tensilelite-hosta first-class, exportable CMake target with a curatedpublic API, and removes the source-level symbol-hiding workaround in favor of the standard
generated export header. It also folds in the device-library codegen export and the
hipSPARSELt consumption path, so the whole "TensileLite host is a real library, not an
internal blob" change lands as one unit.
Default build mode is static embed.
tensilelite-hostbuildsSTATICby default and isabsorbed into each consumer; under the hidden-visibility preset its internals (and, in
static mode, the public API itself) stay hidden, so neither consumer re-exports the
TensileLite surface and the interposition hazard is avoided — the same end state as the old
workaround, achieved through the normal export-header mechanism rather than hand-maintained
hidden regions. A single-owner shared
libtensilelite-host.so(true structural dedup) isavailable behind
TENSILELITE_BUILD_SHARED_LIBS=ONbut is not the default yet: it needsa single producer across the two consumers plus a TheRock packaging edge that is still being
worked out, so it is deferred to avoid shipping a half-wired shared mode.
What changed
Library shape & exports
tensilelite-hostis an exportable target (was anOBJECTlibrary absorbed into eachconsumer). Build type is driven by
TENSILELITE_BUILD_SHARED_LIBS(defaultOFF→ staticembed;
ON→ co-exportedroc::tensilelite-hostshared library). The dedicated option isused instead of the not-yet-synced global
BUILD_SHARED_LIBSso the fork can't silentlyflip the mode.
TENSILE_APIflips from a hidden marker to thegenerated
TENSILELITEHOST_EXPORT(generate_export_header). Under the hidden visibilitypreset only the curated public API is annotated for export; internals stay hidden. In a
static build the macro resolves empty (
TENSILELITEHOST_STATIC), so the public API stayshidden inside the consuming
.sotoo. The 100 deadTENSILE_HIDDEN_BEGIN/ENDno-opwrappers are removed.
tensilelite-hostlinks thestatic LLVM archives; as a shared library it re-exported ~3700 LLVM symbols at default
visibility (the preset governs the target's own code, not prebuilt archives). In a client
process those interpose on the HIP runtime's in-process comgr LLVM and corrupt its AMDGPU
cl::optregistry — comgr then rejects its own backend options at code-object load(
Unknown command line argument '-amdgpu-prelink'). Linking with--exclude-libs,ALL(scoped to
SHARED AND NOT WIN32) localizes the static-archive symbols; LLVM exports drop~3700 → ~10 with the public API unchanged.
dllexportfix. Annotate thefriend operator<<(TensorDescriptor)firstdeclaration with the export macro so clang-cl doesn't error on a redeclaration.
Device-library codegen (formerly "Phase 3")
HipBLASLtCodegen.cmake/HipBLASLtCodegenInstall.cmakemodules wire the Tensile device-library generation as areusable, exported step so a downstream consumer can produce
TensileLibrary_lazy_<arch>without re-deriving the codegen invocation.
find_packageon TheRock. When hipBLASLt is aprovided package (
HIPSPARSELT_ENABLE_THEROCK AND "hipblaslt" IN_LIST THEROCK_PROVIDED_PACKAGES) hipSPARSELt doesfind_package(hipblaslt CONFIG REQUIRED);otherwise it builds hipBLASLt in-tree via
add_subdirectory.Install-surface correctness
on the in-tree hipBLASLt subdirectory was suppressing the
add_custom_target(... ALL)thatgenerates the device library, so
TensileLibrary_lazy_<arch>.datnever installed and thehipSPARSELt suite failed every case with "Could not initialize Tensile library". Root fix:
gate origami's and stinkytofu's install/export/package rules on
<PROJ>_STANDALONE(top-level project ⇒ find_package'd ⇒ installs its surface;
add_subdirectory⇒ embedded⇒ installs nothing), matching what rocisa already does. hipSPARSELt then drops
EXCLUDE_FROM_ALL: hipBLASLt's device-library target builds and installs to
lib/hipsparselt/librarywith no origami/stinkytofu pollution in the install tree.tensilelite-hostships binary-only; theTensile host header file sets are attached for build-tree/IDE organization and consumed via
BUILD_INTERFACE, so the installedTensile/*.hppcollision problem cannot occur.TENSILE_USE_HIPnon-HIPDataTypespaths, un-gatedTENSILE_DEFAULT_SERIALIZATION, deleted the dead Tensile-fork glue and the old hipSPARSELtper-symbol workaround, and kept origami out of the installed package surface.
Diagnostics
[tensilelite:diag]logfmt+banner facility in theclient (
Diagnostic.hpp) mirrored in the python test harness, with auto config/gpu/phasecontext, to make client/codegen failures legible in CI.
CI fixes folded in
GlobalParameterskeys that fail validation on thecurrent loader.
Set changed size during iteration: the logic-file load usedParallelMap2(return_as="generator_unordered"). joblib ≥1.5.0's unordered_retrieveiterates
self._jobs_setunlocked (next(iter(...))) while the dispatcher thread mutatesit, racing into
RuntimeError: Set changed size during iteration(seen on the gfx950 /ubuntu-24 precheckin loading 648 logic files across 32 threads). Switched to ordered
return_as="generator", which waits onself._jobs[0]and never iterates_jobs_set.Result is unchanged — the loop merges into
masterLibrariesand sorts explicitly fordeterminism, so consumption order is irrelevant. (Latent since the call adopted unordered
mode in Jan 2025 [hipBLASLt] Remove debug parameter #1514; armed once CI picked up joblib ≥1.5.0.)
Validation (gfx90a / gfx942)
tensilelite-hostbuildslibtensilelite-host.a; the export carries itSTATIC IMPORTEDwithINTERFACE_COMPILE_DEFINITIONS TENSILELITEHOST_STATIC. The publicAPI stays hidden inside the consuming
.so, so neither consumer re-exports the TensileLitesurface.
TENSILELITE_BUILD_SHARED_LIBS=ON):libhipblaslt.soDT_NEEDEDs thesingle
libtensilelite-host.so.1; 0 strong-symbol duplicates between them. Remainingoverlaps are benign weak template/inline vague-linkage, loader-deduped. hipSPARSELt
resolves the same single lib.
exported, 0
cl::opt/target-registry symbols remain, TensileLite API exports unchanged.add_subdirectorypath (gfx942):TensileLibrary_lazy_gfx942.dat.zlibinstalls tolib/hipsparselt/library; the installtree contains no origami/stinkytofu headers, cmake exports, or libraries.
TheRock-visible delta
roc::tensilelite-hostisco-exported inside hipBLASLt's existing package; in the default static build there is no new
installed library at all.
find_packages hipBLASLt (TheRock,provided) or builds it in-tree, and the device library now installs to
lib/hipsparselt.consumer they install nothing.
AIHPBLAS-3522